Cost-aware view materialization for highly distributed datasets

Authors

  • Justin Cappos
  • Austin Donnelly
  • Richard Mortier
  • Dushyanth Narayanan
  • Antony Rowstron
Abstract

Querying large datasets distributed over thousands of endsystems is a challenge for existing distributed querying infrastructures. High data availability requires either replicating or centralizing the dataset but both require infeasibly high network bandwidth. In-situ querying provides low bandwidth overheads but requires users to tolerate low data availability. This paper advocates partial data replication, increasing the availability of a subset of the data through centralization and/or in-network (peer-to-peer) replication. This is analogous to materializing views in centralized databases, but where materialized views in centralized databases trade view update overheads for query overheads, in the distributed case they trade bandwidth usage for availability. Given an example workload, state-of-the-art tools for centralized databases are able to determine a set of materialized views that will improve performance. Key to this is the ability to estimate view maintenance costs with different hypothetical materialized views. This paper describes estimation of view maintenance costs in a highly distributed database. We present metrics that capture the cost of different materializations, and show that we can estimate these metrics accurately, efficiently, and scalably on a real distributed dataset.

Similar resources

An Approach for Selection and Maintenance of Materialized View in Data Warehousing

Quick response time and accuracy are important factors in the success of any database. In large databases particularly in distributed database, query response time plays an important role as timely access to information and it is the basic requirement of successful business application. A data warehouse uses multiple materialized views to efficiently process a given set of queries. The material...

Optimization for iterative queries on MapReduce

We propose OptIQ, a query optimization approach for iterative queries in distributed environment. OptIQ removes redundant computations among different iterations by extending the traditional techniques of view materialization and incremental view evaluation. First, OptIQ decomposes iterative queries into invariant and variant views, and materializes the former view. Redundant computations are r...

Materialization Is Available

The role of materialized views is becoming vital in today’s distributed Data warehouses. Materialization is where parts of the data cube are pre-computed. Some of the real time distributed architectures are maintaining materialization transparencies in the sense the users are not known with the materialization at a node. Usually what all followed by them is a cache maintenance mechanism where t...

Caching and Materialization for Web Databases

Database systems have been driving dynamic websites since the early 1990s; nowadays, even seemingly static websites employ a database back-end for personalization and advertising purposes. In order to keep up with the high demand fuelled by the rapid growth of the Internet, a number of caching and materialization techniques have been proposed for web databases over the years. The main goal of t...

An Efficient Materialized View Selection Approach for Query Processing in Database Management

Quick response time and accuracy are important factors in the success of any database. In large databases particularly in distributed database, query response time plays an important role as timely access to information and it is the basic requirement of successful business application. A data warehouse uses multiple materialized views to efficiently process a given set of queries. The material...

Publication date: 2007